EXTREMAL DEPENDENCE ANALYSIS OF NETWORK SESSIONS 
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Abstract. We refine a stimulating study by Sarvotham et al. [2005] which highlighted the influence of peak 
transmission rate on network burstiness. From TCP packet headers, we amalgamate packets into sessions 
where each session is characterized by a 5-tuple (S, D, R, R v , T)=(total payload, duration, average transmis- 
sion rate, peak transmission rate, initiation time). After careful consideration, a new definition of peak rate 
is required. Unlike Sarvotham et al. [2005] who segmented sessions into two groups labelled alpha and beta, 
we segment into 10 sessions according to the empirical quantiles of the peak rate variable as a demonstration 
that the beta group is far from homogeneous. Our more refined segmentation reveals additional structure 
that is missed by segmentation into two groups. In each segment, we study the dependence structure of 
(S,D,R) and find that it varies across the groups. Furthermore, within each segment, session initiation 
times are well approximated by a Poisson process whereas this property does not hold for the data set taken 
as a whole. Therefore, we conclude that the peak rate level is important for understanding structure and for 
constructing accurate simulations of data in the wild. We outline a simple method of simulating network 
traffic based on our findings. 



1. Introduction 

Statistics on data networks show empirical features that are surprising by the standards of classical queuing 
theory. Two distinctive properties, which are called invariants in the network literature, are: 

• Heavy tails for quantities such as file sizes [Arlitt and Williamson, 1996, Willingcr and Paxson, 
1998, Lcland et al., 1994, Willingcr et al., 1998], transmission durations and transmission delays 
[Maulik et al., 2002, Rcsnick, 2003]. 

• Network traffic is bursty [Sarvotham et al., 2005], with rare but influential periods of high transmis- 
sion rate punctuating typical periods of modest activity. Burstiness is a somewhat vague concept 
but it is very important in order to understand network congestion. 

When studying burstiness, bursts are observed in the sequence of bytes-per-time or packets-per-time, 
which means that a window resolution is selected and the number of bytes or packets is counted over 
consecutive windows. Sarvotham et al. [2005] attempt to explain the causes of burstiness at the user-level. 
By the user-level, we mean the clusters of bytes that have the same source and destination network addresses, 
which we term sessions. As a simplification, associate a session with a user downloading a file, streaming 
media, or accessing websites; a more precise definition is given later. For each session, measurements are 
taken on the size or number of bytes transmitted, the duration of the transmission and the average transfer 
rate. If the primary objective is to explain sources of burstiness, the session peak rate arises as a natural 
additional variable of interest. The peak rate is computed as the maximum transfer rate over consecutive 
time slots. 

In order to explain the causes of burstiness at the user-level, Sarvotham et al. [2005] studied the depen- 
dence structure of quantities such as session size, duration and transfer rate. They concluded that it is 
useful to split the data into two groups according to the values of peak rate and consider the properties of 
each group. These two groups were called alpha sessions consisting of sessions whose peak rate is above a 
high quantile, and beta sessions, comprising the remaining traffic. Various criteria for segmenting into the 
two groups were considered but always the alpha group was thought of as sessions corresponding to "power 
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users" who transmit large files at large bandwidth, and the beta group was the remaining sessions. This 
analysis yielded the following: 

• A tiny alpha group relative to a huge beta group. In addition, it appeared that the alpha group was 
the major source of burstiness. 

• A dependence structure that is quite different in the alpha and beta groups, with approximate 
independence between rate and size for the alpha group and approximate independence between 
rate and duration for the beta group. To see this, Sarvotham ct al. [2005] measured dependence 
with correlations between the log- variables. 

We wondered if the large beta group should be treated as one homogeneous collection of users, espe- 
cially when one is happy to identify a small and distinct alpha group. Thus, we have investigated whether 
segmenting the beta group further produces meaningful information. 

Section 2 contains more details on the network traffic traces that we study, and gives the precise definition 
of session, size, duration, rate and peak rate. Historically [Crovella and Bestavros, 1997, Leland et al., 1994, 
Willinger et al., 1995, 1997], data collection was over finely resolved time intervals, and thus a natural 
definition of peak rate is based on computing the maximum transfer rate over consecutive time slots. We 
discuss in Section 2 that this definition may be flawed due to the choice of the time window resolution giving 
the peak rate undesired properties. Thus we propose our own definition of peak rate. 

In Section 3, we study the marginal distributions of size, duration and rate, and in Sections 4 and 5 we 
explore the dependence structure between these three variables. Throughout these sections, we depart from 
the approach of Sarvotham et al. [2005] by not just looking at the alpha and beta groups; instead, we have 
split the data into q groups of approximately equal size according to the quantilcs of peak rate. Thus, where 
we previously had a beta group, we now have q — 1 groups, whose peak rates are in a fixed quantile range. 
We show that the alpha/beta split is masking further structure and that it is important to take into account 
the explicit level of the peak rate. In Sections 4 and 5, we also review and use methods that are more suitable 
than correlation in the context of heavy tailed- modeling for studying the dependence of two variables. 

We also have considered in Section 6 whether session starting times can be described by a Poisson process. 
While several authors have shown that the process of packet arrivals to servers cannot be modeled under 
the framework of Poisson processes [Paxson and Floyd, 1995, Willinger et al., 1997, Willinger and Paxson, 
1998, Hohn et al., 2003], some argue that the network traffic is driven by independent human activity and 
thus justify the search for this underlying Poisson structure at higher levels of aggregation [Park et al., 
2006]. We have found that despite its inadequateness to describe the overall network traffic, a homogenous 
Poisson process is a good model for the session initiation times within each of the q groups produced by 
our segmentation of the overall traffic. In Section 7 we conclude with some final thoughts including a rough 
outline for simulation of data sets based on the aforementioned Poisson framework, and give possible lines 
of future study. 

2. Definitions 

2.1. Size S, duration D, and rate R of e2e sessions. Transmissions over a packet-switched computer 
network do not take place in a single piece, but rather in several small packets of data of bounded maximum 
size that depends on the specific network protocol. Thus, packet-level network traffic traces consist of records 
of packet headers, containing information of each individual packet such as arrival times to servers, number 
of bytes transmitted, source and destination network addresses, port numbers, network protocols, etc. As 
the packets travel across the network, routers and switches use the packet header information to move each 
packet to its correct destination. The two main goals of packet-switching are to optimize the utilization of 
available line bandwidth and to increase the robustness of communication [see e.g. Keshav, 1997]. 

The nature of the network data sets presents a challenge for modeling user behavior. Models such as 
a superimposition of on-off processes [Willinger et al., 1997, Sarvotham et al., 2005] or an infinite-source 
Poisson model [Guerin et al., 2003, Maulik et al., 2002, D'Auria and Resnick, 2006, 2008] require a way 
to reconstruct from the individual packets either a suitable on-period for the former model, or a suitable 
transmission session, for the latter. One possible approach [Sarvotham et al., 2005, Willinger et al., 1997] 
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Figure 1 . Top arrow Representation of a typical session; here each packet is depicted as an 
oval Middle arrow Sarvotham et al. [2005] 's division approach Bottom arrow Our proposed 
division according to the packet arrival times 

is to define an end to end (e2e) session, or briefly session, as a cluster of bytes with the same source and 
destination network addresses, such that the delay between any two successive packets in the cluster is less 
than a threshold t. A session plays the role of an arriving entity in an infinite-source Poisson model or the 
role of an on-period in an on-off process model. 
For each session, we have the following variables: 

• S represents the size, that is, the number of bytes transmitted. 

• D represents the duration, computed as the difference in seconds between the arrival times of the 
first and last packets in the session. 

• R represents the average transfer rate, namely S/D. 

Note that R is not defined for single-packet sessions, for which D by definition is zero. More generally, 
sessions with very small D may also be problematic to handle. For instance, it would be hard to believe that 
a session sending only two packets back-to-back has an R that equals the line bandwidth. In order to avoid 
this issue, for our analysis we ignore sessions with D < 100ms. See Zhang et al. [2002] for related comments. 

2.2. Predictors of burstiness. In addition to S, D and R, Sarvotham et al. [2005] consider a fourth 
quantity which serves as an explanatory variable for burstiness, namely the session's maximum input in con- 
secutive time windows. A closely related variable arises by considering the session's peak rate in consecutive 
intervals. In what follows, we review the properties of these two variables and show that they are not ideal 
for describing burstiness. Therefore, we will propose a different definition of peak rate. 

2.2.1. The S-maximum input. Fix a small S > and divide each session in I subintcrvals of length 5, where 
I = \D/S\ (see Fig. 1, Top and Middle). For i = 1, . . . , I, define the following auxiliary variables: 

• Bi represents the number of bytes transmitted over the zth subinterval of the session. 

• Tj represents the duration of the ith subinterval. For i = 1, — 1, we have Tj = 5. However, 
notice that T t = D - (I - 1)8. 

The S-maximum input of the session is defined as Ig = VILi Bl- This Ig is the original variable used by 
Sarvotham et al. [2005] . 

2.2.2. The S-peak rate. If the goal is to explain burstiness, a natural alternative to maximum input is to 
consider rates in consecutive time subintervals, rather than inputs. This yields a closely related predictor: 
the (5-peak rate. The definition of the 5-peak rate for a session, denoted as Rg, relies on the Sarvotham et al. 
[2005] 's division of the session (see Fig. 1). We define R s = \/" =1 BJT,. 
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Observe the following properties for a session: 

(i) £r=i^ = s; 

(ii) T: =1 T t = D; 

(hi) Rs > R. To see this, note 

V" V'' <\/B l/Tl = R s . 
u 2^ l =i 1 i i=1 

A quick analysis shows that the last property does not necessarily hold if we do not carefully define 
the duration of the last subinterval T n as above, but instead set T n = 5. For a numerical example, let 
5=1 and consider a session with n = 2,Bi = B 2 = 1,D = 1.1. Using the wrong definition T n = 5 
yields 2~i = T 2 = 1, hence the average transfer rate R = [B\ + B 2 )/D = 2/1.1 but the peak transfer rate 
R s =Taax{B 1 /T 1 ,B 2 /T 2 } = 1 < 2/(1.1). 

While both Is and Rs appear to be natural predictors of burstiness, they both possess undesirable prop- 
erties. They both depend on the parameter 5 which is not an intrinsic characteristic of the session. As 5 J, 0, 
many consecutive subintervals thus have a single packet, as in Fig. 1. Therefore, as 5 | 0, 

• Ig — > maximum packet size, which precludes Is from being a useful measure of burstiness. 

• Rs — > oo, implying that Rs is much greater than the line capacity as 5 — > 0. In fact, for this limit to 
hold, it is sufficient that a single subinterval has a few packets, which suggests that the convergence 
rate of Rs — » oo is greater than the convergence rate of Is -^maximum packet size. This implies that 
we still may have unreasonably large Rss for relatively large 5s. Therefore, the interpretation of Rs 
as an actual peak transfer rate becomes problematic. 

Owing to the drawbacks of the previous two definitions, we propose our own definition of peak rate. 

2.2.3. Peak rate R v . Suppose a session has p packets (see Fig. 1, Bottom). Consider the following variables. 

• B[ represents the number of bytes of the ith packet. 

• T[ represents the interarrival time of the ith and (i + l)th packets, i = 1, . . . ,p— 1. 
For k = 2, . . . ,p, we define the peak rate of order k, denoted by R^ k \ as 

p—k+l Spj + k-l r>l 

* (fe) = V (2-D 

3=1 2^ii=j i 

In the above definition, the quotient measures the actual transfer rate of a stream of bytes consisting of 
k consecutive packets. For a session consisting of p packets, there are p — k + l streams of k consecutive 
packets, hence R^ is a measure of the actual peak transfer rate when only k consecutive packets are taken 
into account. We then define the peak rate as 

p 

R v =\f i? (fc) . (2.2) 

k=2 

Notice that R v > R.W = R. 

As opposed to Is and Rs, i? v does not depend on external parameters such as 5, and thus it is an 
intrinsic characteristic of a session. In addition, i? v inherits the interpretation of R^ and therefore it may 
be interpreted itself as a measure of the actual maximum transfer rate taken over all possible streams of 
consecutive packets. A drawback of our definition is that i? v is complex to analyze mathematically. 

2.3. The data set. We present our results for a network trace captured at the University of Auckland be- 
tween December 7 and 8, 1999, which was publicly available as of May 2009 through the National Laboratory 
for Applied Network Research website at http://pma.nlanr.net/Special/index.html. Auckland's data 
set is a collection of GPS-synchronized traces, where all non-IP traffic has been discarded and only TCP, 
UDP and ICMP traffic is present in the trace. We have taken the part of the trace corresponding exclusively 
to incoming TCP traffic sent on December 8, 1999, between 3 and 4 p.m. We have found that our results 
hold for several other data sets. See Section 7 for more details about this and other data traces. 
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The raw data consists of 1,177,497 packet headers, from which we construct 44,136 sessions using a 
threshold between sessions of t — 2s and considering only those sessions with D > 100ms (as explained in 
the last paragraph of Section 2.1). We have found similar results for various choices of thresholds between 
sessions, including t = 0.1, 0.5, 10, 60, 100s, but here we only present our results for t = 2s. 

In addition, for each session we have peak rate RV and starting time IV Thus, the data set has the form 
{((S^, Di, Ri), Ry , Tj); 1 < i < 44, 136}, that is, a set of 5-tuples, but with the above notation we emphasize 
that the primary interest is placed on the dependence structure of triplet (Si, Di, Ri). 

We split these sessions into 10 groups of approximately equal size according to the empirical deciles of i? v . 
Thus, all the sessions in the gth group, g = 1, ... ,10, have R v is in a fixed decile range, (10(g — 1)%, 10</%]. 
Hence we term the group of sessions "the 5th decile group" , g = 1, . . . , 10. Therefore, where Sarvotham et al. 
[2005] had alpha and beta groups, we now have a more refined segmentation. 

In the remainder of the paper, we show that this refined split reveals features that are hidden by an 
elementary alpha/beta split. 

3. Marginal distributions of S, D and R. 

In what follows, we analyze the marginal distributions of S, D and R in the 10 different decile groups to 
check for the presence of one stylized fact in network data sets, namely, heavy tails. We found that S and 
D have heavy tails for all the different decile groups, but not R. Let us start by discussing background. 

3.1. Heavy tails and maximal domains of attraction. A positive random variable Y has heavy tails if 
its distribution function F satisfies 

l-F(y)=F(y)=y- 1 ^L(y), (3.1) 

where L is a slowly varying function and 7 > 0. We also say that F is heavy tailed and we call 7 the shape 
parameter. When F satisfies Eq. 3.1, it is also said to have regularly varying tails with tail index I/7. 
Equation 3.1 is equivalent to the existence of a sequence b n — > 00 such that 

A c^(-), (3.2) 

vaguely in M+(0, 00], the space of Radon measures on (0, 00]. Here ^ 7 (x,oo] = x" 1 ! 1 and c > 0. Equation 
3.2 is important for generalizing the concept of heavy tailed distributions to higher dimensions. 

An important concept is maximal domains of attraction. Suppose {Yi\i > 1} is iid with common distri- 
bution F. The distribution F is in the maximal domain of attraction of the extreme value distribution G 7 , 
denoted F G 2?(G 7 ), if there exist sequences a n > and b n G R such that for y G E' 7 ^ = {y G R : 1+72/ > 0}: 

= G 7 (y):=cxp{-(l + 72/ )- 1 ^}. (3.3) 

This is equivalent to the existence of functions a(t) > and b(t) G R such that for y G E^: 

lim tP [Yt > a(t)y + bit)} = - \ogGJy). (3.4) 

t— >oo 

The class of distributions 2?(G 7 ) is known as the Frechet domain when 7 > 0, Gumbel domain when 7 = 
and Weibull domain when 7 < 0. Equation 3.3 is also known as the extreme value condition. 
For 7 > 0, 

F(y) = y- lh L{y) ^Fe 2?(G 7 ), (3.5) 

for some slowly varying L. In other words, a necessary and sufficient condition for a distribution to be heavy 
tailed is that it is in the Frechet class [dc Haan and Ferreira, 2006, Resnick, 1987]. 

Thus, the focus will be placed on checking the Frechet domain condition. First, we check whether or not 
the marginals of (S, D, R) are in some domain of attraction. If that turns out to be the case, we proceed to 
check for the specific domain class. 

3.2. Domain of attraction diagnostics. 
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3.2.1. Excesses over high thresholds. One common method [Davison and Smith, 1990, Bcirlant et al., 2004, 
Coles, 2001, Reiss and Thomas, 2007, McNeil et al., 2005, de Haan and Ferreira, 2006] to check the extreme 
value condition, given by Eq. 3.3, relies on threshold excesses, using all data that are "extreme" in the sense 
that they exceed a particular designated high level. 

More precisely, consider a random variable Y with distribution function F. Given realizations of Y, say 
j/i, . . . , y n and a threshold u, we call yj an exceedance over u if yj > u, and in such case, yj — u is called the 
excess. Denote the excess distribution over the threshold ti as F„, i.e. 

F u (y)=V[Y-u<y\Y>u}, 

for all < y < yF — u, where yp < oo is the right endpoint of F. The connection with domains of attraction 
is that 

F G V(Gr i ) & Um sup \F u (y) - GPD lMu) {y)\ = 0, for some (3(u) > 0. (3.6) 

U^y F 0<y<y F - U 

Here GPD^p, with 7 G R, j3 > is the generalized Pareto distribution , defined as 

GPD lJ3 {y) :=l-(l + 1 y/P)- lh , 

for y > when 7 > and < y < —[3/j when 7 < 0. See Pickands [1975], Balkema and de Haan [1974], 
de Haan and Ferreira [2006]. 

For a distribution F, the method of excesses over high thresholds (also referred to as peaks over thresholds 
or POT) assumes equality in Eq. 3.6 holds for a high threshold u, without need to take a limit, meaning 
that the excess distribution over such u equals a generalized Pareto distribution. Sec Embrechts et al. [1997], 
Coles [2001], Reiss and Thomas [2007], de Haan and Ferreira [2006]. Thus, suppose Y\, ■ ■ ■ , Y n are iid with 
common distribution F and let Yi :n < Y^.n < ■ • • < Y n:n be the order statistics. Fix a high threshold 
u = Y n -k-.n as the (k + l)th largest statistic, and fit a GPD lt p model to Y n —k+i-.n — u,..., Y n:n — u. Then the 
evidence supports F G P(G 7 ) if and only if for some high threshold u that fit is adequate. For informally 
assessing the goodness of fit, we compare via quantile-quantilc (QQ) plots the sample quantiles, namely 
Y n -k+i-.n — u, . . . , Y n . n — u, against the theoretical quantiles given by the GPD fit. It is not difficult to show 
that Z ~ GPD 1 _p is equivalent to the statement that log (1 + ^Z/0)) /j ~ exp(l), and so we draw QQ plots 
in this latter scale after estimating 7, /3 by means of, say, maximum likelihood. 

Using the POT method, we found no evidence against Fs,Fp, G 2?(G 7 ) for all the 10 decile groups. 
Typical QQ plots are those corresponding to the GPD lt p fit for the excess of S and D in the 10th decile 
group, shown in Fig. 2, Upper left and Upper right panels, respectively. Both plots exhibit an almost perfect 
straight line. We also found that the QQ plots corresponding to the excesses of S and D in all the other 
decile groups exhibit straight line trends, showing thus no evidence against satisfaction of the extreme value 
condition. 

Similarly, Fig. 2 Lower left panel exhibits the QQ plot of the GPD 7t p fit for the excess of R in the 10th 
decile group, which shows no evidence against Fr G P(G 7 ). However, for all the other decile groups, we 
found evidence against Fr G 2?(G 7 ). For instance, a QQ plot of the GPD 7i| g fit for the excess of R in the 
4th decile group is shown in Fig. 2 Lower right panel, exhibiting a major departure from the straight line. 
We also found no straight line trend in the rest of the QQ plots of the GPD lyl s fit for the excess of R in the 
lower nine decile groups. 

3.2.2. Formal tests of domain of attraction. Recently, two formal methods for testing F G 2?(G 7 ) have been 
derived by Dietrich et al. [2002] and Drees et al. [2006]. Both tests are similar in that the two are based 
on quantile function versions of the well known Cramer von-Mises and Anderson-Darling test statistics [see 
e.g. Lehmann and Romano, 2005], respectively, for checking the goodness of fit of a given distribution. In 
addition, both tests assume a so called second order condition which is difficult to check in practice. A 
follow-up study by Husler and Li [2006] examines the two tests' error and power by simulations. A thorough 
discussion of these tests and the second order condition is provided by de Haan and Ferreira [2006]. Here 
we review the method proposed in Dietrich et al. [2002] and use it to test for the extreme value condition. 
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Figure 2. GPD QQ-plots of excesses; a number k = 450 of upper order statistics is used 
for each fit Upper left Size in the 10th decile group Upper right Duration in the 10th decile 
group Lower left Rate in the 10th decile group Lower right Rate in the 4th decile group 

Dietrich et al. [2002] state that if F £ V{G 1 ) for some 7 £ K and also if F satisfies an additional second 
order tail condition [for the details of this condition, see Dietrich et al., 2002, equation (4)], then: 

Jo V 7+ J- J 

^E, := /V(l - y_)(t^- l W(t) - W(l)) (1 - 7 _) 2 ^__lp 7 _ 
Jo v 7- 

+ - ^ - - Ry_ + (1 -f-)Ry_ J s-~ ( -- 1 \ogsdsyt 2 dt, (3.7) 

as k — > 00, k/n — > 0,n — > 00 and ^^^^(n/fc) — > 0, where A is related to the second order condition, 
7 + = max{7, 0} and 7_ = min{7, 0}, W is a Brownian motion, P~ l _ and i? 7 _ are some integrals involving 
W [for the details, see Dietrich et al., 2002, dc Haan and Ferreira, 2006], and 7+ and j- are consistent 
estimators of the corresponding parameters. 

In practice, Dietrich et al. [2002] recommend replacing 7 by its estimate. Therefore, based on Eq. 3.7, we 
could test 

Ho : F £ 2?(G 7 ),7 £ K + second order condition 
by first determining the corresponding quantile Qi_ a , 7 of the distribution E^ and then comparing it with 
the value of Ek yU . If Ek >n > Qi-a,y we reject Hq with asymptotic type I error a and otherwise there is 
no evidence to reject Hq. Notice that this is a one-sided test of hypothesis, but a two-sided test could be 
performed in a similar fashion. 
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A drawback of this test is that we must include in Hq the additional second order condition, which is 
difficult to check in practice. While many common distributions satisfy the second order condition, including 
the normal, stable, Cauchy, log-Gamma, among others, the Pareto distribution is a notable example of a 
distribution which does not satisfy the second order condition. 

In addition, there are two other drawbacks of this test. First, it is based on the usual setting of acceptance- 
rejection regions, and thus it provides no measure of the strength of rejection of Hq. While this typically 
is addressed with the equivalent setting based on p-values, the limit distribution in Eq. 3.7 is analytically 
intractable and so are the p-values. Second, since the limit in Eq. 3.7 depends on k, the conclusions of the 
test are also highly dependent on the choice of k. 

Dietrich et al. [2002] state a corollary in which the limit distribution in Eq. 3.7 is greatly simplified 
by observing that 7_ — for all 7 > 0. This result is easier to apply. Under the assumption that F G 
P(G 7 ),7 > and the second order condition, Eq. 3.7 becomes: 



Suppose Ei, ... , En is a random sample of E, that we can obtain by simulation since the limit distribution 
in Eq. 3.8 is free of unknown parameters. Based on Eq. 3.8, wc propose the following test for 



If p(k) < a, then reject Hq with an asymptotic type I error a, otherwise there is no evidence to re- 
ject Hq. With this method, the p-values give a measure of the strength of rejection of Ho. Further- 
more, we can check the stability of the conclusion of the test as a function of k by constructing the plot 
{{k,p(k)); k in an appropriate range}. The range of values of k is chosen to accommodate for the limit in 
Eq. 3.8, namely k — > 00, k/n — > 0, n — > 00. For example, Husler and Li [2006] found via simulations that 
the power of the test in Dietrich et al. [2002] appears to be high for k such that k/n w 0.05, at least for 
their various choices of F. To compute we use % iU given by the consistent Hill estimator [Hill, 1975] 

or perhaps maximum likelihood if we suspect 7 = 0. 

We use this method with an asymptotic nominal type I error a = 0.05. We found no evidence against 
Fs,Fd G P(G 7 ),7 > for all the 10 decile groups. Typical plots of the p-values p(k) for the variables S 
and D are those corresponding to the 10th decile group, shown in Fig. 3 Upper left and Upper right panels, 
respectively. Both plots exhibit that p(k) > 0.05 for a wide range of values of k. Wc also found here that the 
plots {(k,p(k))} corresponding to all the other decile groups show no evidence against Fs, Fr> G P(G 7 ), 7 > 0. 
Coupled with the evidence from the QQ plots, we believe 7 > 0. 

Similarly, Fig. 3 Lower left exhibits the plot {(k,p(k))} corresponding to the distribution of R in the 
10th decile group. Once again we found that p(k) > a for a wide range of values of k, thus showing 
no evidence against Fr G P(G 7 ),7 > 0. However, we did find evidence against Hq ■ Fr G 2?(G 7 ),7 > 
+ second order condition for all the lower nine decile groups. A typical example of the plot {(k,p(k))} in 
these latter groups is exhibited in Fig. 3 Lower right for the 4th decile group, which shows that p{k) are 
significantly lower than 0.05 across a wide range of k values. 

Therefore, for the lowest nine decile groups, we reject Hq : Fr € 2?(G 7 ),7 > 0+second order condi- 
tion. One possible alternative is that Fr G 2?(G 7 ),7 < 0, or equivalently, that xp R < 00 and F( X f r -r)-~ 1 G 
X>(G_i/ 7 ) [dc Haan and Ferreira, 2006, Resnick, 1987]. Hence, by applying the above test to H : Fr XF G 
23(G 7 ), 7 > 0+second order condition, we dropped the possibility of this new Hq because the p(k) < 0.05 for 




(3.8) 



Hq : F G T>(G~f), 7 > + second order condition. 
Estimate a (one-sided) p- value p(k) = P(E > Ek, n ) as the relative frequency 
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Figure 3. Plots of p- values as a funtion of k for the test of the extreme value condition 
for the distribution of the following variables; a horizontal dashed line is drawn at a = 0.05 
Upper left Size in the 10th decile group Upper right Duration in the 10th decile group Lower 
left Rate in the 10th decile group Lower right Rate in the 4th decile group 

a wide range of values of k for the lower nine decile groups. Here we estimated xf r with R n - n + l/n' for a 
high value of n' ■ 

This last result left us with two possibilities. Either Fr 2?(G 7 ),7 G R or simply, the additional second 
order condition does not apply for Fr (and in this case we still may have Fr G D(G 7 ), 7 > 0). As previously 
mentioned, the second order condition is difficult to check in practice without any prior knowledge of the 
distribution function, and thus the hypothesis test fails to provide a clear description of the distribution Fr. 
Nevertheless, in Section 5 we are able to say something about Fr. 

3.3. Estimation. In Section 3.2.2 we showed that Fs,Fd <G 2?(G 7 ),7 > for all the decile groups and 
Fr G 2?(G 7 ), 7 > only for the 10th decile group. We now proceed to the estimation of the shape parameter 
7 for these distributions. 

The Hill estimator is a popular estimator of 7 [Hill, 1975, Csorgo et al., 1985, Davis and Resnick, 1984, 
dc Haan and Resnick, 1998, Hall, 1982]. The Hill estimator based on the k largest order statistics is 

1 ™ Y 

%"- = T iog v V ' n » k = l,...,n-l. (3.9) 

™ . j 1 1 * n — k:n 

i—n — k-\-l 

For F G P(G 7 ),7 > 0, the Hill estimator 7^ is a consistent estimator of 7. Furthermore, under an 
additional second order condition of the type needed in Eq. 3.7: 

Vfc(7fc,«-7) -iiV(0, 7 2 ), (3.10) 

so both consistency and asymptotic normality hold as k — * oo,fc/n — * 0, and n — > oo. See, for example, 
de Haan and Ferreira [2006], Resnick [2007], Geluk et al. [1997], de Haan and Resnick [1998], Peng [1998], 
de Haan and Peng [1998], Mason and Turova [1994]. 
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Figure 4. Hill plots, with 95% confidence bands in dashed lines, corresponding to the shape 
parameter 7 of the variables in the 10th decile group; the values on top give thresholds and 
the values on bottom indicate the number of upper order statistics Upper left Size Upper 
right Duration Lower left Rate 

The Hill estimator depends on the number k of upper-order statistics and so in practice, we make a Hill 
plot {(fc,7fc,n); k > 1} and pick a value of jk,n for which the graph looks stable. Figure 4 exhibits Hill plots 
for the shape parameter 7 of the distribution of S, D and R for the 10th decile group. The three plots show 
stable regimes for 7 around k = 450. We also found stability in the Hill plots for the shape parameter 7 of 
the distribution of S, D in all the other decile groups. 

Table 1 contains the Hill estimates of 7 for our data set, along with estimates of the asymptotic standard 
error based on Eq. 3.10. A number k of upper order statistics was chosen individually for each variable 
and each decile group based on the corresponding Hill plots. For most decile groups, we used k as 400 
(k/n « 0.05), as suggested by the empirical study by Hiisler and Li [2006]. Notice that the majority of the 
estimates are greater than 0.5, which implies that the corresponding distributions have infinite variances. 

We make some final comments on our choice of the Hill estimator against the Pickands estimator [Pickands, 
1975, Dekkcrs and de Haan, 1989]. In Section 3.2.2 we only showed that 7 > which is a weaker assumption 
than the requirement of 7 > for the Hill estimator. Unlike the Hill estimator, the Pickands estimator is 
more robust in the sense that it docs not need 7 > 0. However, the Pickands plots proved to be very unstable 
for our data set. 

4. Dependence structure of (S, D, R) when the three variables have heavy tails 

We now analyze the dependence structure of the triplet (S, D, R) across the 10 different decile groups. 
Since S = DR, at most two of the three components in (S, D, R) may be independent. This makes it 
reasonable to focus on the analysis of each pair of variables. We concentrate on the pairs in (S, D, R) with 
heavy tailed marginals and first focus on the dependence structure of (S, D) across the 10 deciles groups. We 
later study the dependence structure of both (R, S) and (R, D), but only in the 10th decile group. For the 
other decile groups, we found strong evidence suggesting R does not have heavy tails, and thus we leave this 
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Table 1 . Summary of Hill estimates with asymptotic standard errors for the shape param- 
eter of S, D and R 



Decile group 


7s 


s.e. 


ID 


s.e. 


7h 


s.e. 


1 


0.56 


0.056 


0.60 


0.028 






2 


0.55 


0.061 


0.47 


0.023 






3 


0.62 


0.044 


0.63 


0.034 






4 


0.62 


0.036 


0.62 


0.029 






5 


0.61 


0.035 


0.55 


0.029 






6 


0.69 


0.040 


0.55 


0.028 






7 


0.88 


0.042 


0.73 


0.037 






8 


0.77 


0.045 


0.71 


0.033 






9 


0.70 


0.037 


0.69 


0.032 






10 


0.73 


0.034 


0.68 


0.032 


0.58 


0.027 



case for Section 5. Our finer segmentation into the deciles of R v reveals hidden features in an alpha/beta 
split, and therefore it is important to take into account the explicit level of R v . 

One way to assess the dependence structure is with sample cross-correlations. In heavy-tailed modeling, 
although the sample correlations may always be computed, there is no guarantee that the theoretical corre- 
lations exist. Recall Table 1 shows that most estimates of 7 for S, D and R are greater than 0.5, and thus 
correlations do not exist in these instances. Moreover, correlation is a crude summary of dependence that 
is most informative between jointly normal variables, and that certainly docs not distinguish between the 
dependence between large values and the dependence between small values. In the context of data networks, 
the likelihood of various simultaneous large values of (S, D, R) may be important for understanding bursti- 
ness. For example, if large values of D arc likely to occur simultaneously with large values of R, then we 
can expect a network that is prone to congestion. In this situation, a scatterplot {(-Di, Ri)} would be mostly 
concentrated in the interior of the first quadrant of M 2 . On the other hand, if large values of one variable 
are not likely to occur with large values of the other one, the same scatterplot would be mostly concentrated 
on the axes. 

Understanding network behavior requires a description of the extremal dependence of S, D and R and 
this extremal dependence is conveniently summarized by the spectral measure. See de Haan and Resnick 
[1977], de Haan and Ferreira [2006], Resnick [2007, 2008]. We begin by discussing important concepts. 



4.1. Bivariate regular variation and the spectral measure. Let Z be a random vector on E := 
[0,oo] 2 \ {(0,0)}, with distribution function F. The tail of F is bivariate regularly varying if there exist 
a function b(t) — * 00 and a Radon measure v on E, such that 



Z 

W) 



(4.1) 



vaguely in E. Notice that this is a straightforward generalization of the univariate case as formulated in Eq. 
3.2. 

In terms of dependence structure of the components of Z, it is often illuminating to consider the equivalent 
formulation of Eq. 4.1 that arises by transforming to polar coordinates. We define the polar coordinate 
transform of Z = {X, Y) e E by 



(N, 9) = POLAR(Z) := ||Z||, 



IZI 



(4.2) 



where from this point on we use the L\ norm given by ||Z|| = X + Y. 
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Bivariatc regular variation as formulated in Eq. 4.1 is equivalent to the existence of a function b(t) — > oo 
and a probability measure § on K + , where K + = {z G E; |z| = 1}. such that 

cis 7 x §(•), (4.3) 

vaguely in M + ((0, oo] x Here i> 7 (r, oo] = r -1 / 7 , r > 0, and c > and, as usual, M+((0, oo] x K+) are 

the positive Radon measures on (0, oo] x H + . Since there is a natural bijection between H + and [0, 1], namely 

1T§IT TfWT' WC Can anc ^ w ^ assume § is defined on [0, 1]. 

The probability measure S is known as the limit or spectral measure, and it quantifies the dependence 
structure among the components of the bivariate random vector. Consider the following two cases which 
represent opposite ends of the dependence spectrum [Coles, 2001, Resnick, 2007]. Suppose Z = (X, Y) is a 
random vector in E that is bivariate regularly varying as in Eq. 4.3. 

• On one end of the dependence spectrum, if Z = (X, Y) and X and Y are iid, then § concentrates 
on 9 G {0, 1}, corresponding to the axis x = and y = 0, respectively Conversely, if § concentrates 
on 9 G {0, 1}, then there is negligible probability that both X and Y are simultaneously large, and 
this behavior is called asymptotic independence. 

• On the other end of the dependence spectrum, if Z = (X, Y) and X = Y, the two components are 
fully dependent and S concentrates on 9 = 1/2. This behavior is called asymptotic full dependence. 
If § concentrates on the interior of the E, then we can expect X and Y to be highly dependent. 

Since § could be any probability measure, there are infinitely many kinds of dependence structures between 
the two extreme cases discussed above. Therefore, we focus on the estimation of the spectral measure § as 
means of discerning the asymptotic dependence between two random variables with heavy tails. 



Mt(-) :=*P 



— 



,e 



4.2. Estimation of the spectral measure § by the antiranks method. For estimating S, we use the 
following result. If {Zj, i = 1, . . . , n} is a random sample of iid vectors in E whose common distribution F 

is bivariate regularly varying as in Eq. 4.1, then for (iVj, 0j) := I HZ*. 



1 ™ 



cv 1 x §, 



(4.4) 



as k — > oo, k/n — > 0, and n — > oo. Equation 4.4 provides a consistent estimator of S, since 

E^ 1 «(w./6(f),e,)((l.<»] x 



Er=i e Jv,/&(t)(( 1 '°°]) 



s(-), 



(4.5) 



provided b(t) = t. See Huang [1992], dc Haan and Ferreira [2006], Resnick [2007]. 

However, the phrasing of bivariate regular variation in Eq. 4.1 requires scaling the two components of 
Z = (X, Y) by the same factor, which implies that 



nP 



X 



CiiAy(-), nl 



Y 

T G 

Or, 



C2f 7 (')i 



for Cj > and j = 1, 2. When ci > and C2 > 0, both X and Y have the same shape parameters and their 
distributions are tail equivalent [Resnick, 1971]; this is the standard regular variation case. 

In practice, we rarely encounter bivariate heavy tailed data for which the 7 of each component is the 
same. For example, consider the bivariate random vector (S,D). For many decile groups, observe in Table 
1 that 75 7^ 7£>. Hence, in order to estimate S, one possible approach is to transform the data to the 
standard case using the antiranks method [Huang, 1992, de Haan and Ferreira, 2006, Resnick, 2007]. This 
procedure does not require estimation of the 7s, yet it achieves transformation to the standard case, thus 
allowing the estimation of S. However, the transformation destroys the iid property of the sample and a 
more sophisticated asymptotic analysis is required. 
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We proceed as follows. For iid bivariate data {(Xi,Yi),l < i < n} from a distribution in a domain of 
attraction, define the marginal antiranks by 

n n 

1=1 1=1 

that is, rj is the number of jth components that are as large as the jth component of the ith observation. 
Then: 

• Transform the data {(Xi, Yi), 1 < i < n} using the antirank transform: 

{Z 4 ; 1 < i < n} = {{k/rf\k/rf ] ); 1 < % < n}. 

• Apply the polar coordinate transformation 

POLAR Ql).^ = (Ni,k,Qi,k). 

• Estimate S with 

3 M ^ti e W, fc ,e„ fc )((l,oo] x ■) 

3>k,n{-) = F^n 77\ in =>" MO- (4.6) 

See Resnick [2007], de Haan and Rcsnick [1993]. 
The interpretation of Eq. 4.6 is that the empirical probability measure of those 0s whose radius TV is 
greater than 1 consistently approximates §. Hence, we should get a good estimate of S by fitting an adequate 
distribution to the points {O^; > 1}, for a suitable k (see Section 4.3). Even though we do not know 
that § has a density, often a density estimate is more striking than a distribution function estimate. For 
example, a mode in the density at 1/2 reveals a tendency towards asymptotic dependence, but modes in the 
density at and 1 exhibit a tendency towards asymptotic independence. 

4.3. Parametric estimation of the spectral density of (S,D). Using the antiranks method described 
above, we transform the points {(Si, DP)} for each decile group separately. Figure 5 shows histograms of the 
transformed points {Oi,fc; -ZV^fc > 1}. The histograms suggest that the strength of the dependence between 
S and D decreases as R v increases, since there is increasing mass towards the ends of the interval [0, 1] as 
the i? v goes up. It is certainly apparent that asymptotic independence does not hold in any decile group. 

In order to assess the significance of this apparent trend, we review a parametric estimator of the spectral 
density. The histograms in Fig. 5 show that the spectral density is reasonably symmetric for each decile 
group, suggesting that the logistic family may be an appropriate parametric model. The logistic family is a 
symmetric model for the spectral density [Coles, 2001], defined by 

h(t) = l(l--l)t- 1 -T{l-t)- 1 -^[t-^ + (l-ty^f- 2 , 0<t<l, (4.7) 



2 \4i 

with a single parameter ip £ (0, 1). For ip < 0.5, h is unimodal, whereas for increasingly large values 
of ip > 0.5, the density places greater mass towards the ends of the interval [0,1]. In fact, asymptotic 
independence is obtained as tp — > 1, and perfect dependence is obtained as ip —> 0. This allows us to quantify 
the effect of R v on the dependence between S and D. 

We first fit the model in Eq. 4.7 to the data {Qi,k', Ni.k > 1} within each R v decile group by maximum 
likelihood estimation. The log- likelihood function of ip based on t\ , . . . , t n is 

n 

+ ]>> - 2) log(trV* + (1 - U)- 1 ^), (4.8) 
1=1 

which we maximize numerically for < tp < 1. By considering ip as a function of k, we choose a value of 
k around which the estimate of ip looks stable. Figure 5 shows that the logistic estimates of the spectral 




Figure 5. Logistic estimates of the spectral density of (S,D) superimposed on the his- 
tograms of the points {©j,fe; Ni k > 1}: starting with the 1st decile group from the upper 
left and going left to right by row 
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1 1 1 1 r 

6 8 10 12 14 



log(Peak Rate) 

Figure 6. Parameter -0 as a function of log(i? v ) and three linear models of the form given 
by Eq. 4.9 superimposed: (solid line) link function in Eq. 4.10, (dashed line) logit link, 
(dotted line) probit link. The logit and probit links are almost indistinguishable in the range 
of the data 

density are in close agreement to the histogram of the points. On top of each plot, we indicate the maximum 
likelihood estimates of tp and the choice of k in the corresponding decile group. The estimates of ip confirm 
a decline in dependence between S and D as the decile group increases, as measured by increasing estimates 
of ip. 

We now study the form of this decline by fitting a global trend model simultaneously to all the peak 
rate decile groups, using the same data (antirank transformed, polar coordinate transformed, thresholded) 
employed for the separate analyses. In this joint study, the parameter ip in Eq. 4.8 is a function of i? v as 
follows: 

<T x (V0 =/3 + /?ilog(i? v ), (4.9) 

where g is a link function. The used of log(i? v ) instead of i? v is a common technique in linear models to 
improve fit. Since ij) G (0, 1), natural choices of g are the logit and the probit functions. However, as shown 
in Fig. 6, the link function 

1 + e x 

is more adequate than the usual logit or probit links. The link given by Eq. 4.10 is very similar to the logit 
link, but confines the possible values of ip to the interval (0, 0.5) and is suggested by the fact that in Fig. 5 
the histograms of the points {©ifc;iVifc > 1} put all mass around an apparent mode at 0.5. This behavior 
corresponds to ip < 0.5 as previously stated. 

Figure 6 exhibits in various ways the logistic parameter ip as a function of peak rate using Eq. 4.9. First, 
we plot the points V = {{meS l \ip^)\ 1 < i < 10}, where med^ is the median of the logi? v variable for 
sessions in the ith decile group and ip^ is the maximum likelihood estimated logistic parameter in the ith 
decile group. In Fig. 6, we superimpose on V the estimated Eq. 4.9 using the link function in Eq. 4.10, 
showing that the goodness of fit of the model in Eq. 4.9 is quite reasonable. 
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Table 2. Summary of estimated linear model given by Eqs. 4.9 and 4.10 





Estimated parameter 


Bootstrap standard errors 




-1.432 


0.219 


A 


0.288 


0.127 



To assess the effect of R v on the dependence structure of (S,D), we focus on j3i. Observe that Eq. 
4.8 gives the log-likelihood of the model for independent observations. Since iV^fc > 1} is not an 

independent sample due to the antirank transform, the classical maximum likelihood theory is not strictly 
applicable. Hence, to quickly compute the standard error of /3i we bootstrap the whole model. However, 
several authors have shown in the context of heavy-tailed phenomena that if the original sample is of size n, 
then the bootstrap sample size m should be of smaller order for asymptotics to work as desired [Athreya, 1987, 
Deheuvels et al., 1993, Gine and Zinn, 1989, Hall, 1990, Resnick, 2007]. In connection with the estimation of 
the spectral measure, the bootstrap procedure works as long as m — oo, m/n — > and n — x. Therefore, 
a bootstrap procedure to estimate the standard error of /3i is constructed as follows: 

(i) From the original sample {(Si, Di, J?/); 1 < i < 44136}, a bootstrap sample {(Sf, D|, -R/*); 1 < i < 
10000} is obtained. Notice that the bootstrap sample size is of smaller order than the original sample 
size. Our choice of m = 10000 owes to the need of having enough data points to perform estimation. 
However, the choice of the bootstrap sample size is as tricky as choosing the threshold k used in, say, 
Hill estimation. Hence, this may be subject of further study. 

(ii) Split the bootstrap sample {(S*, D*, R^*); 1 < i < 10000} into 10 groups according to the quantiles of 
RV*. 

(iii) Within each bootstrap decile group, transform the data {(S*,D*); 1 < i < 1000} using the antirank 
transform and then transform to polar coordinates to obtain {©* fe ; N* k > 1}. Here, for each bootstrap 
decile group we use the same value of k that is used in the original estimation. These values are shown 
in Fig. 5. 

(iv) Fit the global linear trend simultaneously to all the bootstrap decile groups, by maximizing Eq. 4.8 
with i\) as a function of i? v * as in Eqs. 4.9 and 4.10. Hence, we obtain a bootstrap replication fi\ h . 

(v) Repeat steps (i)-(iv) B = 1000 times and estimate the standard error of $i by the sample standard 
deviation of the B replications 

f B 

s ^)= ^rE^-ffl 2 > ( 4 - n ) 

where fo=YLifc,b/ B - 

Table 2 summarizes the estimated parameters of the linear model for ip and their standard errors. Recall 
that our model assesses dependence through the value of ip. From these results, we conclude that there is a 
significant effect of the level of i? v in the dependence structure of (S,D), since $\ is significantly different 
from 0. 

We conclude that Eq. 4.9 jointly with Eq. 4.10 provide an adequate description of the behavior of ip 
across the decile groups. 

4.4. Parametric estimation of the spectral density of (R,S) and (R,D). We now transform the 
points Si)} and the points -Di)} in the 10th decile group using the previously described antirank 
transform. Figures 7(a) and 7(b) exhibit histograms of the transformed points {O^./t; Ni k > 1} corresponding 
to the pairs (R, S) and (R,D), respectively Both histograms look reasonably symmetric, and thus the 
modeling is done via the logistic family defined by Eq. 4.7. 

Figure 7 shows that the fitted logistic models are in close agreement with the empirical distribution of the 
points {<di,k' : N y k > 1}- Notice that the parameter tp of the logistic density corresponding to the pair (R, D) 
is closer to 1 than the parameter tp of the density corresponding to the pair (i?, S) . This suggests that for 
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the group of sessions with the highest values of i? v , the scheme RD (in which R and D are independent, at 
least asymptotically), is more adequate than the scheme RS (in which R and S arc independent, at least 
asymptotically). This conclusion is exactly the opposite to Sarvotham et al. [2005] 's, since they recommend 
using the scheme RS for the group with the highest peak rates (that is, their alpha group). 

The fact that for the sessions with the highest values of peak rate i? v , we have (R,D) close to asymp- 
totically independent may have the following interpretation. Users with high bandwidth pay little or no 
attention to the duration of their downloads; this is expected because such users know that probably their 
lines are capable of downloading any file, no matter how long it takes. 

5. Dependence structure of (S,D,R) when R does not have heavy tails 

We now investigate the dependence structure of (R, S) and (i?, D) in the first nine decile groups, that 
is, those with values of i? v in the decile ranges (10(<? — 1)%, 10<7%], g = 1, . . . , 9. For these groups, there is 
evidence that the distribution of R is not heavy tailed. Moreover, the diagnostics in Section 3.2.1 suggest that 
R £" T>(G 1 ) for any 7 £ M. However, the other variables S and D have heavy tails in these decile groups, and 
we can make use of the conditional extreme value model [Hcffcrnan and Tawn, 2004, Heffcrnan and Rcsnick, 
2007, Das and Resnick, 2008a,b] to study the dependence structure of the pairs (R,S) and (R,D). 

5.1. The conditional extreme value model. Classical bivariate extreme value theory assumes that both 
variables are in some maximal domain of attraction. When one variable is in a domain of attraction, but 
the other is not, the conditional extreme value model, or CEV provides a candidate model. 

Let Z = (X,Y) £ E = [O.oo] 2 \ {(0,0)} and let EW be the right closure of E^ = {y £ R : 1 + 7 y > 0}. 
The CEV model assumes that Y £ X>(G 7 ),7 £ R, with normalizing sequences a(t) > and b(t) as in Eq. 
3.4. In addition, the CEV model assumes that there exist functions a(t) > 0, f3(t) £ R and a non-null Radon 
measure fj, on the Borel subsets of [—00, 00] x E^ 7 * 1 such that the following conditions hold for any y £ E^: 

(i) For fi— continuity points (x,y): 

fX-pit) Y-bit) \ . , 

tP 7-7 — < x, 7- — > y -> Mr- 00 , x \ x [y, 00 1), t °°- 

V a W / 

(ii) fx([— 00, x] x (y, 00]) is not a degenerate distribution in x. 

(iii) 00, x] x (y, 00]) < 00. 

(iv) H{x) := //([— 00, x] x (0,oo]) is a probability distribution. 

The reason for the name CEV is that, assuming [x, 0) is a fj,— continuity point: 

X - p(t) 



< x Y > bit) — > H(x), t^oo. (5.1) 
a(t) J 

Therefore, Eq. 5.1 provides a way to study the dependence structure of the components of Z when only one 
is in a maximal domain of attraction. 




Figure 7. Logistic estimates in the 10th decile group superimposed on the histograms of 
the points {0^; > 1} Left Spectral density of (R, S) Right Spectral density of (R, D) 



18 



L. LOPEZ-OLIVEROS AND S.I. RESNICK 



5.2. Checking the CEV model. We now review a method for checking the adcquatcncss of the CEV 
model, recently developed by Das and Resnick [2008a]. Suppose {(Xi,Yi); 1 < i < n} are??? iid from the 
CEV model. Define: 

• Y(i) > • • • j ^(n) : The upper-order statistics of Yx, . . . , Y n . 

• X* , 1 < i < n: The X- variable corresponding to Y(i\, also called the concomitant of Yuy 

• r* k = £?=i 1[jc.<jc-]: The rank of X* among Xf, . . . ,X*. 
The Hillish statistic of {(Xj, Yi); 1 < i < n} is defined as 

1 

Hillish fe ,„ := - } log — log -. 

3=1 ■ y 

Under Hq : {(Xi,Yi);l < i < n} are iid from a CEV model, Das and Resnick [2008b] proved that as 
k — > oo, k/n — > 0, and n — > oo: 

Hillish M A 7 Miff , (5.2) 

where I^h is a constant that depends on fi and H defined in Section 5.1. 

Like the Hill estimator, the Hillish statistic depends on the number k 7 so we make a Hillish plot { (fc, Hillishfc^); k > 
1} and observe whether the plot has a stable regime. If that is the case, we conclude that the CEV model 
is adequate for (X, Y). 

5.3. Checking the CEV model for (R,S). The CEV model appears as a candidate model for (R, S) or 
(R, D) for any one of the lowest 9 decile groups, since for these groups R does not appear to be in a domain 
of attraction, while both S and D have heavy tails. We found that the CEV model is adequate for (R, S) 
within each of the lowest nine i? v -decile groups. Here we present our results. 

Figure 8 shows Hillish plots for checking the CEV model for (R, S) in the lowest nine decile groups. Apart 
from the second and third decile groups, all the plots look exceptionally stable. Although the plots do not 
look as good for the second and third decile groups, we still find a stable regime about the k = 800 upper 
order statistic, which supports the CEV model for these groups as well. This emphasizes that more detailed 
structure exists for the beta group of Sarvotham et al. [2005] and that further segmentation reveals more 
information. 

Moreover, observe that the limit constant I^h varies with the decile group. In effect, I^g decreases as 
i? v goes up. Since depends on the limit H in Eq. 5.1, this suggests that the conditional distribution of 
R given S varies with the decile group. Hence, the dependence structure of (S, D, R) depends on the explicit 
level of R v . 

In addition, the Hillish plots reject the CEV model for the pair (R,D) in all the decile groups. We have 
not displayed these plots. 

6. The Poisson property 

There is considerable evidence against the Poisson model as the generating mechanism for network traffic 
at the packet-level [Paxson and Floyd, 1995, Willinger et al., 1997, Willinger and Paxson, 1998, Hohn et al., 
2003]. However, the classical mechanism of Poisson arrival times, namely human activity generating many 
independent user connections to a server each with small probability of occurrence, is still present in the 
network traffic at higher levels. A significant example is provided by Park et al. [2006], which shows that 
"navigation bursts" in the server occur according to the Poisson model. 

Here we found that, although the Poisson model does not appear to activate the overall network traffic, 
it does initiate user sessions for any given group of sessions whose peak rate i? v is in a fixed inter-decile 
range. This allows for quite straightforward simulation within each decile group via a homogeneous Poisson 
process. 

Recall we split the sessions into 10 groups according to the deciles of R w . For any given decile group, 
suppose that Ti are the starting times of the user sessions in increasing order; if necessary, we relabel sessions 
within the group. Let Aj = r^+i — Ti be the session interarrival times. A homogeneous Poisson process is 
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Figure 8. Hillish statistic of (R, S), starting with the 1st decile group from the upper left 
and going by row 

characterized by {A^} being iid with the exponential exp(A) as the common distribution function, for some 
parameter A > 0. 

6.1. Checking the exponential distribution for interarrival times. We first check that {A^} may 
be accurately modeled as exponential random variables within each i? v decile group. As examples, Fig. 9 
Upper left and textitUpper right panels exhibit exponential QQ plots for {A^} for the 4th and 10th decile 
groups, respectively, which compare the quantiles of the empirical and theorical distributions. It is striking 
how well a straight line trend is shown, and this result replicates across all the decile groups. However, 
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Ordered Data 

Figure 9. Exponential QQ plots of the interarrival times of sessions Upper left 4th decile 
group Upper right 10th decile group Lower left overall traffic 

when all the sessions are put together in a single population, the session interarrival times have right tails 
noticeably heavier than exponential, as Fig. 9 Lower left shows. 

Interestingly, we found that the interarrival times within each decile group are not exponentially dis- 
tributed when rather than using the deciles of i? v for the segmentation, we use the deciles of any of the two 
previous predictors of burstiness, namely L$ and Kg. 

6.2. Checking the independence of interarrival times. Within each decile group, can we use the 

independence model for {A^}? We investigated this question with the sample autocorrelation function. The 
sample autocorrelation function (acf) of Ai, . . . , A„ at lag h is defined as 

_ Ett(A,-A)(A I+ ,-A) 
P() ~ Eti(Ai-A)3 

In Section 6.1, we showed {Ai} can be accurately modeled as an exponential random sample, and thus we 
can safely assume that the variances of Aj arc finite. Therefore, the standard L2 theory applies and Bartlett's 
formula from classical time series analysis [Brockwell and Davis, 1991] provides asymptotic normality of p(h) 
under the null hypothesis of independence, namely, 

y/np{h) ^N(0, 1). (6.1) 

Based on Eq. 6.1, we can test Hq : {A^} are independent by first determining the corresponding (upper) 
quantile z\- a of the normal distribution, and then plotting the sample acf as a function of the lag h. 
According to Eq. 6.1, approximately 1 — a of the points p{h) should lie between the bounds izi-an" 1 ' 2 , 
and, if so, there is no evidence against Hq. 

Figure 10 Left and Right panels exhibit sample acf plots for {A^} for the 4th and 10th decile groups. In 
each figure, we plot the confidence bounds for an a = 0.05. We counted 178 and 141 "spikes" coming out of 
those bounds, respectively. This represent less than 5% of the total of 4414. In general, we found that less 
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Figure 10. Sample autocorrelation functions of Aj Left 4th decile group Right 10th decile group 



than 5% of the spikes lie outside the bounds for all the decile groups. Based on the sample acf, there is no 
evidence against the independence of {A^} within each decile. 



7. Final remarks and conclusions 

For the purposes of illustration, we have presented our analysis on a data set publicly available as of 
May 2009 at http://pma.nlanr.net/Special/index.html through the National Laboratory for Applied 
Network Research (NLANR). The particular data file chosen for the analysis is "19991207-125019", which 
can be found within a collection of network data traces dubbed Auckland II recorded in 1999. We have 
successfully tested our analyses and proposed models given by Eqs. 4.7 and 4.9 for other data files in the 
collection Auckland II, as well as a more recent collection dubbed Auckland VIII recorded in 2003. 

Unfortunately, the NLANR website will be shutdown in May 2009. However, the Waikato Internet Traffic 
Storage or WITS (http://www.wand.net.nz/wits/) has expressed a hope to be able to make some of 
the NLANR's data sets (in particular, those corresponding to the Auckland's series) freely available for 
researchers to download in the near future. 

The reason for our choice of the logistic family and the linear trend is because they allow for a simple 
description of the dependence structure of (S, D) via the logistic parameter. As depicted in Figs. 5 and 6, 
the proposed logistic model defined by Eq. 4.7, jointly with the linear trend as in Eq. 4.9, does a sound job 
of explaining the dependence structure of (S, D) as a function of the explicit level of the peak rate R v . 

Our findings can yield more accurate simulation methods for network data. The following is an outline 
of a procedure to simulate network data traces: 

(1) Bootstrap from the empirical distribution of i? v and split the data into, say, 10 groups according to 
the empirical deciles. 

(2) Conditionally on the decile group, simulate the starting times V of the sessions via a homogeneous 
Poisson process. This means that the Poisson rate depends on the decile group; for example, from 
the original data set estimate the Poisson rates for each decile group and use them here. 

(3) From i? v , compute ip using the estimated linear trend as in Eq. 4.9 and use it to simulate an "angle" 

e. 

(4) Simulate the radial component N as a heavy tailed random variable, for instance the Pareto [Resnick, 
2007, de Haan and Resnick, 1993]. 

(5) Finally, transform (N, 6) to Cartesian coordinates in order to get (S, D) and compute R = S/D. 

We are considering details of a software procedure to implement this simulation suggestion. 

Our analyses can be readily extended to other segmentation schemes. For instance, heterogeneous traffic 
comprising different types of applications undoubtedly behaves differently from more homogeneous traffic, a 
fact used to justify the modeling in DAuria and Resnick [2008]. Our analyses should provide useful insights 
by investigating the dependence structure according to the application type. 

On another direction, we have shown evidence for the two following models: 
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• The classical extreme value theory for the pair (S, D), in which both components are heavy-tailed. 

• The conditional extreme value (CEV) model for the pair (R, S), in which only one component, 
namely S, is heavy-tailed. 

Given the fact that R = S/D, we are investigating the conditions on the CEV model for (R, S) that imply 
the classical model for (S, D), and vice versa. 
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